Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] EHN Provide a pos_label parameter in plot_roc_curve #17651

Merged
merged 20 commits into from Jul 6, 2020

Conversation

claramatos
Copy link
Contributor

closes #15573
relates with #17569

Add a pos_label parameter to specify which class to be the positive class when estimator.classes_[1] is not the right choice when using plot_roc_curve.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the refactoring is nice. I would like to have the other 2 PRs in first thought.
Could you add en entry in whats new in 0.24

sklearn/metrics/_plot/base.py Outdated Show resolved Hide resolved
sklearn/metrics/_plot/base.py Outdated Show resolved Hide resolved
sklearn/metrics/_plot/precision_recall_curve.py Outdated Show resolved Hide resolved
sklearn/metrics/_plot/tests/test_plot_roc_curve.py Outdated Show resolved Hide resolved
sklearn/metrics/_plot/tests/test_plot_roc_curve.py Outdated Show resolved Hide resolved
sklearn/metrics/_plot/tests/test_plot_roc_curve.py Outdated Show resolved Hide resolved
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the refactor of _get_target_scores is nice.

I am +0 on the refactor into a new base class. Is there another visualization we would like to add that would use this new base class?

estimator_name : str, default=None
Name of estimator. If None, then the estimator name is not shown.

pos_label : str or int, default=None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having pos_label here makes this base class less generic. We would only be able to use this to plot curves related to classification.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there any plot classes you believe should use the CurveDisplay class?

Copy link
Member

@glemaitre glemaitre Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomasjpfan Do we have curves for regression?
I assume that we could make a trick to find what type of data we handle since we have y and to know if this is a classification problem. If this is not, we can discard pos_label

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be happy with renaming it to ClassificationCurveDisplay

I forgot I had this comment still here.

@glemaitre
Copy link
Member

@claramatos The 2 other PRs have been merged so we can go forward here. You should resolve the conflict and once done, I will look at the tests.

@ogrisel
Copy link
Member

ogrisel commented Jun 24, 2020

Sorry I had to revert #17594 that I merged too quickly (see the discussion at the end for the details).

@claramatos
Copy link
Contributor Author

claramatos commented Jun 27, 2020

Since #17569 is closed can this be closed to? (I've already merged the most recent master version) @glemaitre

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think PR is doing three things.

  1. Refactor into CurveDisplay, which I am concerned with. I am thinking of how this plays out when we want to support multiclass with many curves. I didn't do this type of refactor at first because I wanted the ROC and PR curve to evolve independently of each other.
  2. Refactor into _get_target_scores, which I am happy with.
  3. Add pos_label into plot_roc_curve, which I am happy with.

@claramatos Can we make this PR only about 2 and 3 and have 1 for a follow up PR?

estimator_name : str, default=None
Name of estimator. If None, then the estimator name is not shown.

pos_label : str or int, default=None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be happy with renaming it to ClassificationCurveDisplay

I forgot I had this comment still here.

@claramatos
Copy link
Contributor Author

So your suggestion would be to remove the CurveDisplay refactor from this PR? or renaming CurveDisplay to ClassificationCurveDisplay?

@thomasjpfan
Copy link
Member

Sorry for being unclear. I am suggesting to remove CurveDisplay and move forward with pos_label + _get_target_scores. After pos_label + _get_target_scores get merged we can do another PR for the CurveDisplay refactor.

@claramatos
Copy link
Contributor Author

@thomasjpfan I followed your suggestion and removed CurveDisplay

@@ -38,3 +43,72 @@ def _check_classifier_response_method(estimator, response_method):
estimator.__class__.__name__))

return prediction_method


def _get_target_scores(X, estimator, response_method, pos_label=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the word "scores" here may get confused with metrics and scorers. Maybe _get_response is more clear?

sklearn/metrics/_plot/base.py Outdated Show resolved Hide resolved
sklearn/metrics/_plot/precision_recall_curve.py Outdated Show resolved Hide resolved
sklearn/metrics/_plot/tests/test_plot_roc_curve.py Outdated Show resolved Hide resolved
claramatos and others added 4 commits June 30, 2020 22:30
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thank you @claramatos !

@glemaitre
Copy link
Member

Actually this is fine since we don't compute the ROC-AUC via roc_auc_score but using auc directly.
We can merge this PR now.

Thanks @claramatos

@glemaitre glemaitre merged commit c3f2516 into scikit-learn:master Jul 6, 2020
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Jul 17, 2020
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

plot_precision_recall_curve and plot_roc_curve don't allow picking positive class
4 participants